Reproducibility in Science

👤Ondřej Mottl

Science School of Quantitative Ecology 2025

bit.ly/SSoQE

This presentation


Research cycle

Reproducibility crisis

Evolution

Discussion


What is Open Science to You?

03:00

Open Science

A better view

The Journey

Reproducibility

Reproducibility?


Unknown author

I was able to reproduce my results on my computer/machine

Can I consider my work reproducible?

Reproducibility

Time to change

It is a spectrum

Discussion


How reproducible is your research?

What can you do to improve it?

05:00

It is a spectrum

PROJECTS

Making a paper (compendium)



Project-oriented structure

Each paper is a single project

Project communication plan

Project structure


# .
# └─my_awesome_project/
#   ├─ Data/
#   ├─ Outputs/
#   ├─ R/
#   ├─ README.md
#   ├─ LICENSE
#   └─ [project name].Rproj

Project structure

# .
# └─my_awesome_project/
#   ├─ Data/
#   |   ├─ Input/
#   |   ├─ Processed/
#   |   └─ Temp/
#   ├─ Outputs/
#   |   ├─ Data/
#   |   ├─ Figures/
#   |   └─ Tables/
#   ├─ R/
#   |   ├─ ___Init_project___.R
#   |   ├─ 00_Config_file.R
#   |   ├─ 01_Data_processing/
#   |   ├─ 02_Main_analyses/
#   |   ├─ 03_Supplementary_analyses/
#   |   └─ Functions/
#   |       └─ example_fc.R
#   ├─ README.md
#   ├─ LICENSE
#   └─ [project name].Rproj
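
The layout above can be created from R in a few lines. This is a minimal sketch, assuming the folder names shown in the tree; `create_project_structure()` is a hypothetical helper and not part of any package.

```r
# Hypothetical helper: build the folder skeleton shown above.
create_project_structure <- function(root) {
  dirs <- c(
    "Data/Input", "Data/Processed", "Data/Temp",
    "Outputs/Data", "Outputs/Figures", "Outputs/Tables",
    "R/01_Data_processing", "R/02_Main_analyses",
    "R/03_Supplementary_analyses", "R/Functions"
  )
  paths <- file.path(root, dirs)
  for (path in paths) {
    # recursive = TRUE also creates missing parent folders
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }
  # Add an empty README only if none exists yet (never overwrite one)
  readme <- file.path(root, "README.md")
  if (!file.exists(readme)) {
    file.create(readme)
  }
  invisible(paths)
}

# Usage: build the skeleton in a temporary folder
project_root <- file.path(tempdir(), "my_awesome_project")
created <- create_project_structure(project_root)
```

The LICENSE and `.Rproj` files are left out on purpose: they are usually created by RStudio or chosen by hand.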

Project structure

# .
# └─my_awesome_project/
#   ├─ Data/
#   ├─ Outputs/
#   ├─ R/
#   |   ├─ ___Init_project___.R
#   |   ├─ 00_Config_file.R
#   |   ├─ 00_Master.R
#   |   ├─ 01_Data_processing/
#   |   |   ├─ 01_Prepare_pollen_data.R
#   |   |   └─ 01_Download_terrain_data.R
#   |   ├─ 02_Main_analyses/
#   |   |   ├─ 01_Vegetation_history/
#   |   |   |   ├─ 01_Estimate_dissimilarity.R
#   |   |   |   └─ 02_Summarise_dissimilarity.R
#   |   |   ├─ 02_Rate_of_change/
#   |   |   |   ├─ 01_Roc_estimation.R
#   |   |   |   └─ 02_Roc_interpolation.R
#   |   |   ├─ 03_Temporal_patterns_of_groups/
#   |   |   |   ├─ 01_Define_groups.R
#   |   |   |   └─ 02_Temporal_patterns_of_groups.R
#   |   |   └─ 04_Visualisation/
#   |   ├─ 03_Supplementary_analyses/
#   |   └─ Functions/
#   ├─ README.md
#   ├─ LICENSE
#   └─ [project name].Rproj
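
The numeric prefixes make the execution order explicit, so a master script can run every step in sequence. This is a sketch, assuming `00_Master.R` plays that role (the slides do not show its contents); `run_scripts_in_order()` and the toy scripts are hypothetical.

```r
# Hypothetical helper: source all .R files below `dir` in alphabetical order,
# so that numeric prefixes (01_, 02_, ...) define the execution order.
run_scripts_in_order <- function(dir, envir = globalenv()) {
  scripts <- sort(
    list.files(dir, pattern = "\\.R$", recursive = TRUE, full.names = TRUE)
  )
  for (script in scripts) {
    message("Running: ", script)
    source(script, local = envir)
  }
  invisible(scripts)
}

# Toy demonstration in a temporary folder
demo_dir <- file.path(tempdir(), "R_demo", "01_Data_processing")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)
writeLines("step_one <- 1", file.path(demo_dir, "01_First_step.R"))
writeLines("step_two <- step_one + 1", file.path(demo_dir, "02_Second_step.R"))

# Run the scripts in their own environment to keep the workspace clean
demo_env <- new.env()
ran <- run_scripts_in_order(dirname(demo_dir), envir = demo_env)
```

In a real project, point the helper at the step folders (for example `R/01_Data_processing`) rather than at `R/` itself, so the master script does not source itself.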

Discussion


  • What is your current project structure?
  • What benefits do you think it has?

03:00

RStudio Projects


RStudio already uses Projects by default

Self-containedness

Working Directory



@JennyBryan:

If the first line of your R script is

setwd("C:\Users\jenny\path\that\only\I\have")

I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.

The {here} package comes to the rescue!
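
A minimal sketch of the idea, assuming the {here} package is installed and the script is run from inside a project (for example a folder with an `.Rproj` file). The file name `pollen_data.csv` is hypothetical.

```r
library(here)

# here() builds paths starting from the project root, not from the
# current working directory, so no setwd() is needed.
data_path <- here("Data", "Input", "pollen_data.csv")

# The same line works on any machine that has the project folder
print(data_path)
```

Because the path is assembled from its parts, it also avoids hard-coded separators such as `\` or `/`.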

Self-containedness

R needs to have the environment set up correctly to run the code!

This includes the packages used in the code.

Self-containedness




Will my code run on your machine?

Will my code run in 10 years?

The {renv} package helps make sure it will!

Note on practical exercises

Practical Exercise


  • Install the {renv} package
  • Make a new project and initialize {renv} with renv::init()
  • Create a snapshot with renv::snapshot()
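
The steps above can be sketched as follows, assuming an interactive R session opened inside the new project. `renv::restore()` is not part of the exercise; it is included to show how a collaborator would rebuild the same package library.

```r
# 1. Install {renv} (once per machine)
install.packages("renv")

# 2. Inside the new project: create a project-specific library
#    and a lockfile (renv.lock)
renv::init()

# 3. After installing or updating packages: record the versions in use
renv::snapshot()

# On another machine (or years later): reinstall the recorded versions
renv::restore()
```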

15:00

Self-containedness

More advanced!

What if we share the whole Operating System as well?

Data Science

Why R for Data Science?

R is specifically designed for data science:

  • Import data from various sources
  • Tidy data into consistent structure
  • Transform data for analysis
  • Visualize patterns and relationships
  • Model to understand and predict
  • Communicate results effectively

All wrapped in Programming for reproducibility!
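
The six steps can be sketched in one short script. This is an illustration only, assuming the {tidyverse} is installed; it uses the built-in `iris` data set as a stand-in for real data, and all object and file names are hypothetical.

```r
library(tidyverse)

# Import: write a built-in data set to disk, then read it back as a CSV
csv_path <- tempfile(fileext = ".csv")
write_csv(as_tibble(iris), csv_path)
data_raw <- read_csv(csv_path, show_col_types = FALSE)

# Tidy: one row per flower, measurement, and value
data_long <- data_raw |>
  pivot_longer(
    cols = -Species,
    names_to = "measurement",
    values_to = "value_cm"
  )

# Transform: mean of each measurement per species
data_summary <- data_long |>
  group_by(Species, measurement) |>
  summarise(mean_cm = mean(value_cm), .groups = "drop")

# Visualize: distribution of each measurement by species
fig_measurements <- ggplot(data_long, aes(x = Species, y = value_cm)) +
  geom_boxplot() +
  facet_wrap(vars(measurement), scales = "free_y")

# Model: does petal length predict petal width?
mod_petal <- lm(Petal.Width ~ Petal.Length, data = data_raw)

# Communicate: save the figure and the summary table
ggsave(
  file.path(tempdir(), "measurements.png"),
  fig_measurements,
  width = 6, height = 4
)
write_csv(data_summary, file.path(tempdir(), "summary.csv"))
```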

R for Data Science

Discussion


05:00

{tidyverse}

My suggestion: Let’s embrace the {tidyverse}!

Data wrangling


Code Exercise

GOAL: Learn basic {tidyverse} principles and functions.


Materials:

15:00

Key Takeaways 🎯

Why Tidyverse?

  1. Readability: Code reads like English, making it easier to understand and maintain
  2. Consistency: All functions follow the same design principles
  3. Pipes: Chain operations together logically
  4. Data-first: All functions take data as the first argument (pipe-friendly)
  5. Tidy data: Encourage proper data structure for analysis
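
To make the points about pipes and data-first functions concrete, here is a small comparison. It is a sketch, assuming {dplyr} is installed and R is version 4.1 or newer (for the native `|>` pipe); the built-in `mtcars` data set is used only as an example.

```r
library(dplyr)

# Nested calls: must be read from the inside out
nested <- arrange(
  summarise(
    group_by(filter(mtcars, hp > 100), cyl),
    mean_mpg = mean(mpg)
  ),
  desc(mean_mpg)
)

# Piped: the same steps, read top to bottom
piped <- mtcars |>
  filter(hp > 100) |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg)) |>
  arrange(desc(mean_mpg))
```

Both objects hold the same summary; the piped version works because every function takes the data as its first argument.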

Immutability

Do not edit raw data!

Have a record of all changes to raw data!

Best practice:

  • Data/Input/Raw/
  • Data wrangling in code
  • Save processed data (Data/Processed/)
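
The practice above can be sketched with base R. The folders are created in a temporary location, and the file names and toy data are hypothetical.

```r
# Toy project folders in a temporary location
raw_dir <- file.path(tempdir(), "Data", "Input", "Raw")
processed_dir <- file.path(tempdir(), "Data", "Processed")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(processed_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in for a raw file (hypothetical name and content)
raw_path <- file.path(raw_dir, "counts_raw.csv")
write.csv(
  data.frame(site = c("A", "B", "C"), count = c(10, NA, 7)),
  raw_path,
  row.names = FALSE
)

# Read the raw data; every change happens in code, not in the file
data_raw <- read.csv(raw_path)
data_processed <- data_raw[!is.na(data_raw$count), ]

# Save the result to a separate folder; the raw file stays untouched
processed_path <- file.path(processed_dir, "counts_processed.csv")
write.csv(data_processed, processed_path, row.names = FALSE)
```

The script itself is the record of changes: anyone can rerun it to get from the raw file to the processed one.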

Immutability




  1. Do not overwrite R objects!
    • make a new object instead
  2. Do not overwrite data columns!
    • make a new column instead
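
Both rules can be shown in one small example. This is a sketch, assuming {dplyr} is installed; the data and column names are hypothetical.

```r
library(dplyr)

data_raw <- tibble(
  site = c("A", "B", "C"),
  temperature_f = c(50, 68, 86)
)

# Avoid: overwriting the object and the column in place
# data_raw <- data_raw |>
#   mutate(temperature_f = (temperature_f - 32) * 5 / 9)

# Prefer: a new object with a new, clearly named column
data_celsius <- data_raw |>
  mutate(temperature_c = (temperature_f - 32) * 5 / 9)
```

After this step `data_raw` is unchanged, and `data_celsius` keeps the original column next to the converted one, so each value can be traced back to its source.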

Code Exercise

GOAL: Learn basic Immutability principles.


Materials:

15:00

The Immutability Checklist

✅ File Level:

✅ Object Level:

✅ Column Level:

Summary of Data Science

Use {tidyverse}!

Embrace Immutability!

🚀 Practical Outcomes:

  • Debugging: Can isolate exactly where problems occur
  • Peer Review: Reviewers can trace your analytical logic step-by-step
  • Future You: You’ll understand your analysis months later
  • Collaboration: Others can build on and verify your work
  • Extensions: Easy to modify analysis without starting over

Outro

SPROuT

The materials used in this presentation are covered in greater detail in the course Science Powered through Reproducibility, Openness, and Teamwork (SPROuT), taught at Charles University.

About me

Ondřej Mottl, Assistant Professor at Charles University

Head of the 🧑‍💻 Laboratory of Quantitative Ecology